Abstract

In 1997, Z. Zhang and R.W. Yeung found the first example of a conditional
information inequality in four variables that is not "Shannon-type". This
linear inequality for entropies is called conditional (or constraint) since
it holds only under the condition that some linear equations are satisfied
for the involved entropies. Later, the same authors and other researchers
discovered several unconditional information inequalities that do not
follow from Shannon's inequalities for entropy.

In this paper we show that some non Shannon-type conditional in- 
equalities are "essentially" conditional, i.e., they cannot be extended to 
any unconditional inequality. We prove one new essentially conditional 
information inequality for Shannon's entropy and discuss conditional in- 
formation inequalities for Kolmogorov complexity. 



1 Introduction 

Let $(X_1, \ldots, X_n)$ be jointly distributed random variables on a finite
domain. For this collection of random variables there are $2^n - 1$ non-empty
subsets and for each subset we have a value of Shannon's entropy. We call this
family of entropies the entropy profile of the distribution
$(X_1, \ldots, X_n)$. Thus, to every n-tuple of jointly distributed random
variables there corresponds its entropy profile, which is a vector of values
in $\mathbb{R}^{2^n - 1}$. We say that a point in $\mathbb{R}^{2^n - 1}$ is
entropic if it is a vector of entropies for some distribution.
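
The definition of the entropy profile can be made concrete with a small
sketch. The representation of a distribution as a dictionary from outcome
tuples to probabilities, and the names `entropy`, `entropy_profile` and
`two_bits`, are assumptions made for this illustration only; they do not come
from the paper.

```python
from itertools import combinations
from math import log2

def entropy(dist, idx):
    """Shannon entropy (in bits) of the marginal of `dist` on coordinates `idx`."""
    marginal = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def entropy_profile(dist, n):
    """Entropies of all 2^n - 1 non-empty subsets of the n variables."""
    return {
        subset: entropy(dist, subset)
        for size in range(1, n + 1)
        for subset in combinations(range(n), size)
    }

# Hypothetical example: X_1 and X_2 are independent fair bits.
two_bits = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
profile = entropy_profile(two_bits, 2)
```

Here `profile` has one entry per non-empty subset of variables, so for n = 2
it is a point in $\mathbb{R}^3$, and this point is entropic by construction.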

All entropic points satisfy different information inequalities that
characterize the range of all entropies for $X_i$. The best known and
understood are the so-called Shannon-type inequalities, i.e., linear
combinations of basic inequalities of type $I(U : V | W) \ge 0$, where
$U, V, W$ are any (possibly empty) subsets of the given family of random
variables.
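
As an illustration of the basic inequalities, the following sketch evaluates
$I(U : V | W) = H(U, W) + H(V, W) - H(U, V, W) - H(W)$ for every triple of
subsets of a three-variable distribution. The helper names and the XOR
example are assumptions chosen for this illustration.

```python
from itertools import product
from math import log2

def entropy(dist, idx):
    """Shannon entropy (in bits) of the marginal of `dist` on coordinates `idx`."""
    marginal = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def cond_mutual_info(dist, U, V, W=()):
    """I(U : V | W) = H(U,W) + H(V,W) - H(U,V,W) - H(W)."""
    U, V, W = set(U), set(V), set(W)
    return (entropy(dist, sorted(U | W)) + entropy(dist, sorted(V | W))
            - entropy(dist, sorted(U | V | W)) - entropy(dist, sorted(W)))

# Hypothetical example: X_0, X_1 are independent fair bits, X_2 = X_0 XOR X_1.
xor_dist = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

# All (possibly empty) subsets of {0, 1, 2}, and all basic quantities I(U : V | W).
subsets = [tuple(i for i in range(3) if mask >> i & 1) for mask in range(8)]
basic_values = [cond_mutual_info(xor_dist, U, V, W)
                for U, V, W in product(subsets, repeat=3)]
```

In this example $I(X_0 : X_1) = 0$ while $I(X_0 : X_1 | X_2) = 1$, and every
value in `basic_values` is non-negative up to rounding.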

In 1998 Z. Zhang and R.W. Yeung proved the first example of an unconditional
non Shannon-type information inequality, which was a linear inequality
for entropies of $(X_1, X_2, X_3, X_4)$ that cannot be represented as a
combination of basic inequalities [5]. Since this seminal paper of Zhang and
Yeung was published, many (in fact, infinitely many) non Shannon-type linear
information inequalities were proven, see, e.g., [7], [8], [9], [12], [13].
These new inequalities were
applied in problems of network coding [TJJ, secret sharing [TB], etc. However, 
these inequalities and their 'physical meaning' are still not very well understood. 

In this paper we discuss conditional (constraint) information inequalities.
That is, we are interested in linear information inequalities that are true
only given some linear constraint for entropies. Trivial examples of
conditional inequalities can be easily derived from (unconditional) basic
inequalities, e.g., if $H(X_1) = 0$ then $H(X_1, X_2) \le H(X_2)$. However,
some conditional inequalities cannot be obtained as a corollary of
Shannon-type inequalities. The first example of a nontrivial conditional
inequality was proven in [3] (even before the first example of an
unconditional non Shannon-type inequality):



$$\text{if } I(A : B) = I(A : B | C) = 0, \text{ then } I(C : D) \le I(C : D | A) + I(C : D | B). \tag{1}$$

Another conditional inequality

$$\text{if } I(A : B | C) = I(B : D | C) = 0, \text{ then } I(C : D) \le I(C : D | A) + I(C : D | B) + I(A : B) \tag{2}$$

was proven by F. Matúš in [6].

In [7] it was conjectured that (1) can be extended to some unconditional
inequality

$$I(C : D) \le I(C : D | A) + I(C : D | B) + k\,(I(A : B) + I(A : B | C)) \tag{3}$$

(for some constant $k > 0$). In this paper we prove that this conjecture is
wrong: for any coefficient $k$, inequality (3) is not true for some
distributions. So, inequality (1) is "essentially conditional"; it cannot be
extended to an unconditional information inequality. A similar statement can
be proven for (2).

In this paper we also prove one new conditional linear inequality that cannot
be extended to any unconditional inequality. So, now we have three examples
of essentially conditional linear information inequalities.

It should be noted that these conditional information inequalities are proven
for the set of entropic points. (These three inequalities involve 4-tuples of
random variables, so technically they are statements about the set of the
entropic points in $\mathbb{R}^{15}$.) But it is not known whether they hold
for the almost entropic points (i.e., for the points $x \in \mathbb{R}^{15}$
such that for every $\varepsilon > 0$ there exists an entropic point
$y \in \mathbb{R}^{15}$ at distance less than $\varepsilon$ from $x$). In
fact, the set of the almost entropic points is a nice and interesting object
to study. In some sense, the almost entropic points make a more natural object
than the entropic points; e.g., for every $n$ the set of all almost entropic
points for n-tuples of random variables is a closed convex cone (while for
$n > 2$ the set of all entropic points is not closed and not a cone). We
recall that some piecewise linear conditional information inequality proven
in [15] holds only for the entropic but not for the almost entropic points.
So there is an interesting open question: Do inequalities (1), (2), and the
inequality from Theorem 2 hold for the almost entropic points?

It is known that the class of unconditional linear information inequalities
is the same for Shannon's entropy and for Kolmogorov complexity. The situation
with conditional inequalities is more complicated: the known technique used
to prove constraint information inequalities for Shannon's entropy cannot be
directly adapted for Kolmogorov complexity. In fact, it is not even clear how
to formulate Kolmogorov's version of constraint inequalities. However, we
prove for Kolmogorov complexities some counterpart of inequality (1); this
inequality holds only for some special tuples of words.

The paper is organized as follows. In Section 2 we use the technique from [4]
and prove one new conditional information inequality. In Section 3 we prove
that this new inequality as well as (1) and (2) cannot be extended to any
unconditional inequalities. In Section 4 we prove some version of a
conditional inequality for Kolmogorov complexities.

1.1 Corrected errors 

Some errors were found in the previous versions of the paper: 

• The statement of Theorem 4(b) in arXiv:1103.2545v1 was wrong.

• The proof of Theorem 4 in arXiv:1103.2545v2 and arXiv:1103.2545v3
and in the proceedings of ISIT-2011 was wrong. Note that the statement
of this theorem (it claims that the cone of asymptotically entropic points
for 4 random variables is not polyhedral) is true, see [5]. However, the
"new proof" of this result suggested in our paper was not valid since
the conditional inequalities under consideration are proven only for the
entropic but not for the almost entropic points. We thank F. Matúš, who
pointed out this mistake.

2 Nontrivial conditional information inequalities 

The very first example of an inequality that does not follow from basic
(Shannon-type) inequalities was the following result of Z. Zhang and
R. W. Yeung:

Theorem 1 (Zhang-Yeung, [J]). For all random variables $A, B, C, D$, if
$I(A : B | C) = I(A : B) = 0$ then

$$I(C : D) \le I(C : D | A) + I(C : D | B).$$

With the same technique F. Matúš proved another conditional inequality (2),
see [6]. Using a similar method, we prove one new conditional inequality:

Theorem 2. For all random variables $A, B, C, D$, if
$H(C | A, B) = I(A : B | C) = 0$,
then $I(C : D) \le I(C : D | A) + I(C : D | B) + I(A : B)$.



Proof. The argument consists of two steps: enforcing conditional independence 
and elimination of conditional entropy. Let us have a joint distribution of ran- 
dom variables A, B, C, D. The first trick of the argument is a special transforma- 
tion of this distribution: we keep the same distribution of the triples (A, C, D) 
and (B, C, D) but make A and B independent conditional on (C, D). Intuitively 
it means that we first choose at random (using the old distribution) values of C 
and D; then given fixed values of C, D we independently choose at random A 
and B (the conditional distributions of A given (C, D) and B given (C, D) are 
the same as in the original distribution). 

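More formally, we construct a new distribution
$(\tilde A, \tilde B, \tilde C, \tilde D)$. If
$\mathrm{Prob}[A = a, B = b, C = c, D = d]$ is the original distribution, then
the new distribution is defined as follows:

$$\mathrm{Prob}[\tilde A = a, \tilde B = b, \tilde C = c, \tilde D = d] = \frac{\mathrm{Prob}[A = a, C = c, D = d] \cdot \mathrm{Prob}[B = b, C = c, D = d]}{\mathrm{Prob}[C = c, D = d]}$$

(with the convention $\frac{0}{0} = 0$) for all values $a, b, c, d$ of the
four random variables. From the construction ($\tilde A$ and $\tilde B$ are
independent given $\tilde C, \tilde D$) it follows that

$$H(\tilde A, \tilde B, \tilde C, \tilde D) = H(\tilde C, \tilde D) + H(\tilde A | \tilde C, \tilde D) + H(\tilde B | \tilde C, \tilde D).$$

Since $(\tilde A, \tilde C, \tilde D)$ and $(\tilde B, \tilde C, \tilde D)$
have exactly the same distributions as the original $(A, C, D)$ and
$(B, C, D)$ respectively, we have

$$H(\tilde A, \tilde B, \tilde C, \tilde D) = H(C, D) + H(A | C, D) + H(B | C, D).$$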
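
The transformation can be sketched in a few lines. The code below is an
illustration under assumed conventions (a distribution is a dictionary keyed
by tuples (a, b, c, d); the function names and the toy example are
hypothetical), not the authors' implementation.

```python
from math import log2

def marginal(dist, idx):
    """Marginal of `dist` on coordinates `idx`, dropping zero-probability outcomes."""
    out = {}
    for outcome, prob in dist.items():
        if prob > 0:
            key = tuple(outcome[i] for i in idx)
            out[key] = out.get(key, 0.0) + prob
    return out

def entropy(dist, idx):
    return -sum(p * log2(p) for p in marginal(dist, idx).values())

def make_conditionally_independent(dist):
    """Keep the laws of (A,C,D) and (B,C,D); make A and B independent given (C,D)."""
    p_acd = marginal(dist, (0, 2, 3))
    p_bcd = marginal(dist, (1, 2, 3))
    p_cd = marginal(dist, (2, 3))
    new = {}
    for (a, c, d), pa in p_acd.items():
        for (b, c2, d2), pb in p_bcd.items():
            if (c2, d2) == (c, d):
                new[(a, b, c, d)] = pa * pb / p_cd[(c, d)]
    return new

# Hypothetical example: A = B is a fair bit, while C and D are constant.
original = {(0, 0, 0, 0): 0.5, (1, 1, 0, 0): 0.5}
transformed = make_conditionally_independent(original)
```

In this toy example the original $A$ and $B$ are fully dependent, while in
`transformed` they are independent fair bits; the joint entropy of the new
distribution equals $H(C, D) + H(A | C, D) + H(B | C, D) = 2$, as the identity
above predicts.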

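The same entropy of the new distribution
$(\tilde A, \tilde B, \tilde C, \tilde D)$ can be bounded in another way:

$$H(\tilde A, \tilde B, \tilde C, \tilde D) \le H(\tilde D) + H(\tilde A | \tilde D) + H(\tilde B | \tilde D) + H(\tilde C | \tilde A, \tilde B).$$

Notice that the entropies $H(\tilde D)$, $H(\tilde A | \tilde D)$ and
$H(\tilde B | \tilde D)$ are equal to $H(D)$, $H(A | D)$ and $H(B | D)$
respectively (we again use the fact that $(\tilde A, \tilde D)$ and
$(\tilde B, \tilde D)$ have the same distributions as $(A, D)$ and $(B, D)$
respectively in the original distribution). Thus, we get

$$H(C, D) + H(A | C, D) + H(B | C, D) \le H(D) + H(A | D) + H(B | D) + H(\tilde C | \tilde A, \tilde B).$$

It remains to estimate the value $H(\tilde C | \tilde A, \tilde B)$. We will
show that it is zero (and this is the second trick used in the argument).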

Here we will use the two conditions of the theorem. We say that some
values $a, c$ (respectively, $b, c$ or $a, b$) are compatible if in the
original distribution these values can appear together, i.e.,
$\mathrm{Prob}[A = a, C = c] > 0$ (respectively,
$\mathrm{Prob}[B = b, C = c] > 0$ or $\mathrm{Prob}[A = a, B = b] > 0$).
Since $A$ and $B$ are independent given $C$, if some values $a$ and $b$ (of
$A$ and $B$) are compatible with the same value $c$ of $C$, then these $a$
and $b$ are compatible with each other.

In the new distribution $(\tilde A, \tilde B, \tilde C, \tilde D)$ values of
$\tilde A$ and $\tilde B$ are compatible with each other only if they are
compatible with one and the same value of $\tilde C$; hence, these values
must also be compatible with each other in the original distribution
$(A, B)$. Further, since $H(C | A, B) = 0$, for each pair of compatible
values of $A, B$ there exists only one value of $C$. Thus, for each pair of
values of $(\tilde A, \tilde B)$ with probability 1 there exists only one
value of $\tilde C$. In a word, in the new distribution
$H(\tilde C | \tilde A, \tilde B) = 0$.

Summarizing our arguments, we get

$$H(C, D) + H(A | C, D) + H(B | C, D) \le H(D) + H(A | D) + H(B | D),$$

which (under the conditions of the theorem) is equivalent to

$$I(C : D) \le I(C : D | A) + I(C : D | B) + I(A : B).$$

□ 
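
A numerical sanity check of Theorem 2 can be sketched as follows. The family
of test distributions is an assumption made for this illustration: the
supports of $(A, B)$ given each value of $C$ are disjoint "rectangles", which
forces $H(C | A, B) = I(A : B | C) = 0$, and $D$ depends on $(A, B, C)$ through
randomly chosen conditional probabilities. All names are hypothetical.

```python
import random
from math import log2

def entropy(dist, idx):
    marginal = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def cmi(dist, U, V, W=()):
    """Conditional mutual information I(U : V | W)."""
    join = lambda *parts: sorted(set().union(*parts))
    return (entropy(dist, join(U, W)) + entropy(dist, join(V, W))
            - entropy(dist, join(U, V, W)) - entropy(dist, join(W)))

A, B, C, D = (0,), (1,), (2,), (3,)

def random_instance(rng):
    weights = [rng.random() + 0.1 for _ in range(3)]  # unnormalized Prob[C = c]
    total = sum(weights)
    r = rng.random()  # Prob[A = 1 | C = 0]
    # For each c: pairs ((a, b), Prob[A = a, B = b | C = c]) on disjoint supports.
    rectangles = {0: [((0, 0), 1 - r), ((1, 0), r)],
                  1: [((0, 1), 1.0)],
                  2: [((1, 1), 1.0)]}
    dist = {}
    for c, cells in rectangles.items():
        for (a, b), p_ab in cells:
            q = rng.random()  # Prob[D = 1 | a, b, c]
            base = weights[c] / total * p_ab
            dist[(a, b, c, 1)] = base * q
            dist[(a, b, c, 0)] = base * (1 - q)
    return dist

def theorem2_slack(dist):
    """Right-hand side minus left-hand side of the inequality of Theorem 2."""
    return (cmi(dist, C, D, A) + cmi(dist, C, D, B)
            + cmi(dist, A, B) - cmi(dist, C, D))

rng = random.Random(0)
instances = [random_instance(rng) for _ in range(20)]
slacks = [theorem2_slack(d) for d in instances]
```

Every value in `slacks` should be non-negative up to rounding error; such a
check is of course no substitute for the proof.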

The proof of Theorem 2 presented above is based implicitly on non-negativity
of the Kullback-Leibler divergence. The same idea can be presented in a
slightly different form, with an explicit reference to the Kullback-Leibler
inequality. The
argument is almost the same as the proof of the second part of Proposition 2.1 
in©: 

Second version of the proof of Theorem 2. Let $p[a, b, c, d]$ be a
distribution of $(A, B, C, D)$ such that $H(C | A, B) = I(A : B | C) = 0$.
With some abuse of notation we denote projections of this distribution as

$p[a, c, d] = \mathrm{Prob}[A = a, C = c, D = d]$, $p[a, d] = \mathrm{Prob}[A = a, D = d]$, etc.

We construct two new distributions,
$\tilde p[a, b, c, d] = \mathrm{Prob}[\tilde A = a, \tilde B = b, \tilde C = c, \tilde D = d]$
and
$\hat p[a, b, c, d] = \mathrm{Prob}[\hat A = a, \hat B = b, \hat C = c, \hat D = d]$.
We define them as follows:

$$\tilde p[a, b, c, d] = \frac{p[a, c, d] \cdot p[b, c, d]}{p[c, d]}$$

and

$$\hat p[a, b, c, d] = \begin{cases} \dfrac{p[a, d] \cdot p[b, d]}{p[d]}, & \text{if } p[a, b, c] > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Since $I(A : B | C) = 0$, the condition $p[a, b, c] > 0$ is true if and only
if $p[a, c] > 0$ and $p[b, c] > 0$.

Then we use non-negativity of the Kullback-Leibler divergence:

$$0 \le D(\tilde p \,\|\, \hat p) = \sum \frac{p[a, c, d] \cdot p[b, c, d]}{p[c, d]} \log \frac{p[a, c, d] \cdot p[b, c, d] \cdot p[d]}{p[c, d] \cdot p[a, d] \cdot p[b, d]}$$

(the sum over all values $a, b, c, d$ such that $p[a, b, c] > 0$). It follows
immediately that

$$0 \le H(A, D) + H(B, D) + H(C, D) - H(A, C, D) - H(B, C, D) - H(D).$$

Now we add the values
$I(A : B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)$
and $H(C | A, B) = H(A, B, C) - H(A, B)$ to the right-hand side of the
inequality (both these values are equal to 0 for our distribution). We obtain

$$0 \le I(C : D | A) + I(C : D | B) + I(A : B) - I(C : D),$$

and we are done. □
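
The step from the Kullback-Leibler divergence to the entropy expression can
be checked numerically. In the sketch below the concrete distribution `p` is
a hypothetical example chosen to satisfy $H(C | A, B) = I(A : B | C) = 0$; the
variable names `p_tilde` and `p_hat` stand for the two constructed
distributions, and all names are assumptions of this illustration.

```python
from math import log2

def marginal(dist, idx):
    out = {}
    for outcome, prob in dist.items():
        if prob > 0:
            key = tuple(outcome[i] for i in idx)
            out[key] = out.get(key, 0.0) + prob
    return out

def entropy(dist, idx):
    return -sum(q * log2(q) for q in marginal(dist, idx).values())

# Hypothetical distribution of (A, B, C, D): C is a function of (A, B),
# and B is constant for each fixed value of C, so A and B are independent given C.
p = {(0, 0, 0, 0): 0.15, (0, 0, 0, 1): 0.10,
     (1, 0, 0, 0): 0.05, (1, 0, 0, 1): 0.20,
     (0, 1, 1, 0): 0.20, (0, 1, 1, 1): 0.05,
     (1, 1, 2, 0): 0.10, (1, 1, 2, 1): 0.15}

p_acd, p_bcd, p_cd = marginal(p, (0, 2, 3)), marginal(p, (1, 2, 3)), marginal(p, (2, 3))
p_ad, p_bd, p_d = marginal(p, (0, 3)), marginal(p, (1, 3)), marginal(p, (3,))
p_abc = marginal(p, (0, 1, 2))

kl = 0.0
for (a, c, d), pa in p_acd.items():
    for (b, c2, d2), pb in p_bcd.items():
        if (c2, d2) != (c, d):
            continue
        # Because I(A : B | C) = 0, every such (a, b, c) has p[a, b, c] > 0.
        assert (a, b, c) in p_abc
        p_tilde = pa * pb / p_cd[(c, d)]
        p_hat = p_ad[(a, d)] * p_bd[(b, d)] / p_d[(d,)]
        kl += p_tilde * log2(p_tilde / p_hat)

entropy_side = (entropy(p, (0, 3)) + entropy(p, (1, 3)) + entropy(p, (2, 3))
                - entropy(p, (0, 2, 3)) - entropy(p, (1, 2, 3)) - entropy(p, (3,)))
```

For this example `kl` and `entropy_side` agree up to rounding, and both are
non-negative.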



3 Conditional inequalities that cannot be extended to any unconditional
inequalities

In [7] it was conjectured that the conditional inequality from Theorem 1 is a
corollary of some unconditional information inequality (which was not
discovered yet):

Conjecture 1 ([7]). For some constant $k > 0$ inequality (3) is true for all
random variables $A, B, C, D$.

Obviously, if such an inequality could be proven, it would imply the statement
of Theorem 1. Similar conjectures could be formulated for (2) and the
conditional inequality from Theorem 2. We prove that these conjectures are
false, i.e., these three conditional inequalities cannot be converted into
unconditional inequalities:

Theorem 3. (a) For any $k$ the inequality (3) is not true for some
distributions $(A, B, C, D)$.

(b) For any $k$ the inequality

$$I(C : D) \le I(C : D | A) + I(C : D | B) + I(A : B) + k\,(I(A : B | C) + H(C | A, B)) \tag{4}$$

is not true for some distributions $(A, B, C, D)$.

(c) For any $k$ the inequality

$$I(C : D) \le I(C : D | A) + I(C : D | B) + I(A : B) + k\,(I(A : B | C) + I(B : D | C)) \tag{5}$$

is not true for some distributions $(A, B, C, D)$. Thus, (2) cannot be
extended to an unconditional inequality.
Proof. (a) For all $\varepsilon \in [0, 1]$ let us consider the following
joint distribution of binary variables $(A, B, C, D)$:

$$\begin{aligned}
\mathrm{Prob}[A = 0, B = 0, C = 0, D = 1] &= (1 - \varepsilon)/4, \\
\mathrm{Prob}[A = 0, B = 1, C = 0, D = 0] &= (1 - \varepsilon)/4, \\
\mathrm{Prob}[A = 1, B = 0, C = 0, D = 1] &= (1 - \varepsilon)/4, \\
\mathrm{Prob}[A = 1, B = 1, C = 0, D = 1] &= (1 - \varepsilon)/4, \\
\mathrm{Prob}[A = 1, B = 0, C = 1, D = 1] &= \varepsilon.
\end{aligned}$$

For each value of $A$ and for each value of $B$, the value of at least one of
the variables $C, D$ is uniquely determined: if $A = 0$ then $C = 0$; if
$A = 1$ then $D = 1$; if $B = 0$ then $D = 1$; and if $B = 1$ then $C = 0$.
Hence, $I(C : D | A) = I(C : D | B) = 0$. Also it is easy to see that
$I(A : B | C) = 0$. Thus, if (3) is true, then
$I(C : D) \le k I(A : B)$.

Denote the left-hand and right-hand sides of this inequality by
$L(\varepsilon) = I(C : D)$ and $R(\varepsilon) = k I(A : B)$. Both functions
$L(\varepsilon)$ and $R(\varepsilon)$ are continuous, and $L(0) = R(0) = 0$
(for $\varepsilon = 0$ both sides of the inequality are equal to 0). However,
the asymptotics of $L(\varepsilon)$ and $R(\varepsilon)$ as
$\varepsilon \to 0$ are different: it is not hard to check that
$L(\varepsilon) = \Theta(\varepsilon)$, but
$R(\varepsilon) = O(\varepsilon^2)$. From (3) we have
$\Theta(\varepsilon) \le O(\varepsilon^2)$, which is a contradiction.
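
The different asymptotics in part (a) can be observed numerically. The
following sketch evaluates both sides of inequality (3) on the distribution
above; the function names and the sample values of $\varepsilon$ and $k$ are
assumptions chosen for illustration.

```python
from math import log2

def entropy(dist, idx):
    marginal = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def cmi(dist, U, V, W=()):
    """Conditional mutual information I(U : V | W)."""
    join = lambda *parts: sorted(set().union(*parts))
    return (entropy(dist, join(U, W)) + entropy(dist, join(V, W))
            - entropy(dist, join(U, V, W)) - entropy(dist, join(W)))

A, B, C, D = (0,), (1,), (2,), (3,)

def distribution_a(eps):
    """The distribution from part (a); outcomes are tuples (a, b, c, d)."""
    q = (1 - eps) / 4
    return {(0, 0, 0, 1): q, (0, 1, 0, 0): q, (1, 0, 0, 1): q,
            (1, 1, 0, 1): q, (1, 0, 1, 1): eps}

def sides_of_inequality_3(eps, k):
    d = distribution_a(eps)
    lhs = cmi(d, C, D)
    rhs = (cmi(d, C, D, A) + cmi(d, C, D, B)
           + k * (cmi(d, A, B) + cmi(d, A, B, C)))
    return lhs, rhs

lhs, rhs = sides_of_inequality_3(eps=0.01, k=10)
```

With these sample values the left-hand side exceeds the right-hand side, so
inequality (3) with $k = 10$ is violated; larger $k$ only requires a smaller
$\varepsilon$.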

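(b) For every value of $\varepsilon \in [0, 1]$ we consider the following
joint distribution of binary variables $(A, B, C, D)$:

$$\begin{aligned}
\mathrm{Prob}[A = 1, B = 1, C = 0, D = 0] &= 1/2 - \varepsilon, \\
\mathrm{Prob}[A = 0, B = 1, C = 1, D = 0] &= \varepsilon, \\
\mathrm{Prob}[A = 1, B = 0, C = 1, D = 0] &= \varepsilon, \\
\mathrm{Prob}[A = 0, B = 0, C = 1, D = 1] &= 1/2 - \varepsilon.
\end{aligned}$$

The argument is similar to the proof of (a). First, it is not hard to check
that $I(C : D | A) = I(C : D | B) = H(C | A, B) = 0$ for every $\varepsilon$.
Second,

$$\begin{aligned}
I(A : B) &= 1 + (2 - 2/\ln 2)\,\varepsilon + 2\varepsilon \log \varepsilon + O(\varepsilon^2), \\
I(C : D) &= 1 + (4 - 2/\ln 2)\,\varepsilon + 2\varepsilon \log \varepsilon + O(\varepsilon^2),
\end{aligned}$$

so $I(A : B)$ and $I(C : D)$ both tend to 1 as $\varepsilon \to 0$, but their
asymptotics are different. Similarly,

$$I(A : B | C) = O(\varepsilon^2).$$

It follows from (4) that

$$2\varepsilon + O(\varepsilon^2) \le O(\varepsilon^2) + O(k \varepsilon^2),$$

and with any $k$ we get a contradiction for small enough $\varepsilon$.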
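
The estimates of part (b) can be checked with a short computation. In the
sketch below, `gap` is $I(C : D) - I(C : D | A) - I(C : D | B) - I(A : B)$ and
`penalty` is $I(A : B | C) + H(C | A, B)$, so inequality (4) states that
`gap` is at most $k$ times `penalty`. The names and the sample value of
$\varepsilon$ are assumptions of this illustration.

```python
from math import log2

def entropy(dist, idx):
    marginal = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def cmi(dist, U, V, W=()):
    """Conditional mutual information I(U : V | W)."""
    join = lambda *parts: sorted(set().union(*parts))
    return (entropy(dist, join(U, W)) + entropy(dist, join(V, W))
            - entropy(dist, join(U, V, W)) - entropy(dist, join(W)))

A, B, C, D = (0,), (1,), (2,), (3,)

def distribution_b(eps):
    """The distribution from part (b); outcomes are tuples (a, b, c, d)."""
    return {(1, 1, 0, 0): 0.5 - eps, (0, 1, 1, 0): eps,
            (1, 0, 1, 0): eps, (0, 0, 1, 1): 0.5 - eps}

def gap_and_penalty_b(eps):
    d = distribution_b(eps)
    gap = cmi(d, C, D) - cmi(d, C, D, A) - cmi(d, C, D, B) - cmi(d, A, B)
    h_c_given_ab = entropy(d, (0, 1, 2)) - entropy(d, (0, 1))
    penalty = cmi(d, A, B, C) + h_c_given_ab
    return gap, penalty

gap, penalty = gap_and_penalty_b(0.01)
```

For $\varepsilon = 0.01$ the value of `gap` is close to $2\varepsilon$, while
`penalty` is of order $\varepsilon^2$, in agreement with the asymptotics
above.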

(c) For the sake of contradiction we consider the following joint distribution
of binary variables $(A, B, C, D)$ for every value of
$\varepsilon \in [0, 1]$:

$$\begin{aligned}
\mathrm{Prob}[A = 0, B = 0, C = 0, D = 0] &= 3\varepsilon, \\
\mathrm{Prob}[A = 1, B = 1, C = 0, D = 0] &= 1/3 - \varepsilon, \\
\mathrm{Prob}[A = 1, B = 0, C = 1, D = 0] &= 1/3 - \varepsilon, \\
\mathrm{Prob}[A = 0, B = 1, C = 0, D = 1] &= 1/3 - \varepsilon.
\end{aligned}$$

We substitute this distribution in (5) and obtain

$$I_0 + O(\varepsilon) \le I_0 + 3\varepsilon \log \varepsilon + O(\varepsilon) + O(k \varepsilon),$$

where $I_0$ is the mutual information between $C$ and $D$ for
$\varepsilon = 0$ (which is equal to the mutual information between $A$ and
$B$ for $\varepsilon = 0$). We get a contradiction as $\varepsilon \to 0$. □
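
The effect in part (c) is slower to appear, because the violating term is
only of order $\varepsilon \log \varepsilon$. The sketch below computes
`gap` $= I(C : D) - I(C : D | A) - I(C : D | B) - I(A : B)$ and
`penalty` $= I(A : B | C) + I(B : D | C)$, so inequality (5) states that `gap`
is at most $k$ times `penalty`. The names and sample values are assumptions
of this illustration.

```python
from math import log2

def entropy(dist, idx):
    marginal = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def cmi(dist, U, V, W=()):
    """Conditional mutual information I(U : V | W)."""
    join = lambda *parts: sorted(set().union(*parts))
    return (entropy(dist, join(U, W)) + entropy(dist, join(V, W))
            - entropy(dist, join(U, V, W)) - entropy(dist, join(W)))

A, B, C, D = (0,), (1,), (2,), (3,)

def distribution_c(eps):
    """The distribution from part (c); outcomes are tuples (a, b, c, d)."""
    t = 1 / 3 - eps
    return {(0, 0, 0, 0): 3 * eps, (1, 1, 0, 0): t,
            (1, 0, 1, 0): t, (0, 1, 0, 1): t}

def gap_and_penalty_c(eps):
    d = distribution_c(eps)
    gap = cmi(d, C, D) - cmi(d, C, D, A) - cmi(d, C, D, B) - cmi(d, A, B)
    penalty = cmi(d, A, B, C) + cmi(d, B, D, C)
    return gap, penalty

gap_coarse, penalty_coarse = gap_and_penalty_c(1e-3)
gap_fine, penalty_fine = gap_and_penalty_c(1e-9)
```

The ratio of `gap` to `penalty` grows only logarithmically in
$1/\varepsilon$, so refuting (5) for a larger constant $k$ requires a much
smaller $\varepsilon$.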



4 Constraint inequality for Kolmogorov complexity

Kolmogorov complexity of a finite binary string $X$ is defined as the length
of the shortest program that generates $X$; similarly, Kolmogorov complexity
of a string $X$ given another string $Y$ is defined as the length of the
shortest program that generates $X$ given $Y$ as an input. More formally, for
any programming language $L$, Kolmogorov complexity $K_L(X | Y)$ is defined as

$$K_L(X | Y) = \min\{|p| : \text{program } p \text{ prints } X \text{ on input } Y\},$$

and unconditional complexity $K_L(X)$ is defined as the complexity of $X$
given the empty $Y$. The basic fact of Kolmogorov complexity theory is the
invariance theorem: there exists a universal programming language $U$ such
that for any other language $L$ we have
$K_U(X | Y) \le K_L(X | Y) + O(1)$ (the $O(1)$ depends on $L$ but not on $X$
and $Y$). We fix such a universal language $U$; in what follows we omit the
subscript $U$ and denote Kolmogorov complexity by $K(X)$, $K(X | Y)$. We refer
the reader to an excellent book [TU] for a survey of properties of Kolmogorov
complexity.

Kolmogorov complexity was introduced in [5] as an algorithmic version of a
measure of information in an individual object. In some sense, properties of
Kolmogorov complexity are quite similar to properties of Shannon's entropy.
For example, for the property of Shannon's entropy
$H(A, B) = H(A) + H(B | A)$ there is a Kolmogorov's counterpart

$$K(A, B) = K(A) + K(B | A) + O(\log K(A, B)) \tag{6}$$

(the Kolmogorov-Levin theorem, [3]). This result justifies the definition of
the mutual information, which is an algorithmic version of the standard
Shannon's definition: the mutual information is defined as
$I(A : B) := K(A) + K(B) - K(A, B)$, and the conditional mutual information
is defined as

$$I(A : B | C) := K(A, C) + K(B, C) - K(A, B, C) - K(C).$$

From the Kolmogorov-Levin theorem it follows that $I(A : B)$ is equal to
$K(A) - K(A | B)$, and the conditional mutual information $I(A : B | C)$ is
equal to $K(A | C) - K(A | B, C)$ (all these equations hold only up to
logarithmic terms).

In fact, we have a much deeper and more general parallel between Shannon's
and Kolmogorov's information theories; for every linear inequality for
Shannon's entropy there exists a Kolmogorov's counterpart:

Theorem 4. For each family of coefficients $\{\lambda_W\}$ the inequality

$$\sum_i \lambda_i H(a_i) + \sum_{i<j} \lambda_{ij} H(a_i, a_j) + \ldots \ge 0$$

is true for every distribution $\{a_i\}$ if and only if for some constant $C$
the inequality

$$\sum_i \lambda_i K(a_i) + \sum_{i<j} \lambda_{ij} K(a_i, a_j) + \ldots + C \log N \ge 0$$

is true for all tuples of strings $\{a_i\}$, $N = K(a_1, a_2, \ldots)$ ($C$
does not depend on $a_i$).

Thus, the class of unconditional inequalities valid for Shannon's entropy 
coincides with the class of (unconditional) inequalities valid for Kolmogorov 
complexity. What about conditional inequalities? 

In the framework of Kolmogorov complexity we cannot say that some information
quantity exactly equals zero. Indeed, even the definition of Kolmogorov
complexity makes sense only up to an additive term that depends on the choice
of the universal programming language. Moreover, such a natural basic
statement as the Kolmogorov-Levin theorem (6) holds only up to a logarithmic
term. So, if we want to prove a sensible conditional inequality for Kolmogorov
complexity, the linear constraints must be formulated with some reasonable
precision. A natural version of Theorem 1 is the following conjecture:

Conjecture 2. There exist functions $f(n)$ and $g(n)$ such that
$f(n) = o(n)$ and $g(n) = o(n)$, and for all strings $A, B, C, D$ satisfying
$I(A : B | C) \le f(N)$, $I(A : B) \le f(N)$ it holds that
$I(C : D) \le I(C : D | A) + I(C : D | B) + g(N)$ (where
$N = K(A, B, C, D)$).

There is no hope to prove Conjecture 2 with $f(n)$ and $g(n)$ of order
$O(\log n)$. Indeed, using a counterexample from the proof of Theorem 3(a),
we can construct binary strings $A, B, C, D$ such that the quantities
$I(A : B | C)$, $I(A : B)$, $I(C : D | A)$, and $I(C : D | B)$ are bounded by
$O(\log N)$, but $I(C : D) = \Omega(\sqrt{N \log N})$. However, even if
Conjecture 2 is false in general, similar conditional inequalities (even with
logarithmic precision) can be true for some special tuples $A, B, C, D$. In
what follows we show how to prove such an inequality for one natural example
of strings $A, B, C$ (and any $D$).

Let $\mathbb{F}_n$ be the finite field of $2^n$ elements. We consider the
affine plane over $\mathbb{F}_n$. Let $C$ be a random line in this plane, and
$A$ and $B$ be two points incident to this line. To specify the triple
$(A, B, C)$ we need at most $4n + O(1)$ bits of information: a line in a
plane can be specified by two parameters in $\mathbb{F}_n$; to specify each
point in a given line we need additional $n$ bits of information.

We take a triple of strings $(A, B, C)$ as specified above with maximal
possible Kolmogorov complexity, i.e., such that $K(A, B, C) = 4n + O(1)$ (it
follows from a simple counting argument that such a triple exists; moreover,
there are about $2^{4n + O(1)}$ such triples). For these $A$, $B$ and $C$ we
can easily estimate all their Kolmogorov complexities:

$K(A)$, $K(B)$, and $K(C)$ are equal to $2n + O(1)$,
$K(A, C) = 3n + O(1)$, $K(B, C) = 3n + O(1)$,
$K(A, B) = 4n + O(1)$.

For this triple of strings the quantities $I(A : B)$ and $I(A : B | C)$ are
negligible (logarithmic). This condition is very similar to the condition on
random variables $A, B, C$ in Theorem 1. So, it is not very surprising that
Kolmogorov's counterpart of Theorem 1 holds for these strings:

Proposition 1. For the strings $A, B, C$ defined above and for all strings
$D$ we have

$$I(C : D) \le I(C : D | A) + I(C : D | B) + O(\log N),$$

where $N = K(A, B, C, D)$.

This statement can be proven by an argument similar to the proof of
Theorem 2. Let us explain this argument in full detail.

Proof. We may identify $C$ with a linear function $c_1 x + c_2$ over
$\mathbb{F}_n$, where $c_1$ and $c_2$ are elements of the field (since
Kolmogorov complexity of $C$ is large, it cannot be a vertical line on the
plane). Further, the points $A$ and $B$ in this line can be represented as
pairs $(a_1, a_2)$ and $(b_1, b_2)$ such that

$$c_1 \cdot a_1 + c_2 = a_2 \quad \text{and} \quad c_1 \cdot b_1 + c_2 = b_2$$

(here $a_i$ and $b_i$ are also elements of $\mathbb{F}_n$). By assumption,
complexity of the pair $(A, B)$ is close to $4n$. It means that $A \ne B$;
hence, $a_1 \ne b_1$. Let $i$ be one of the indexes such that the $i$th bits
of $a_1$ and $b_1$ are different. W.l.o.g. we assume that the $i$th bit in
$a_1$ is equal to 0 and the $i$th bit in $b_1$ is equal to 1.

Now we split the affine plane over $\mathbb{F}_n$ into two halves: $P_0$ will
consist of all points $(x, y)$ such that the $i$th bit of $x$ is 0, and $P_1$
will consist of the points $(x, y)$ such that the $i$th bit of $x$ is 1. So,
point $A = (a_1, a_2)$ belongs to $P_0$, and $B = (b_1, b_2)$ belongs to
$P_1$.

Now we are going to vary the points $A$ and $B$: we will substitute $A$ and
$B$ by their 'clones' $A'$ and $B'$ so that the triples $(A', B', C)$ remain
"similar" to the initial one $(A, B, C)$. More precisely, we say that $A'$ is
a clone of $A$ if

• $A' = (a_1', a_2')$ is a point in line $C$, and $A' \in P_0$ (i.e.,
$c_1 \cdot a_1' + c_2 = a_2'$, and the $i$th bit of $a_1'$ is equal to 0);

• complexities $K(A')$, $K(A', C)$, $K(A', D)$, and $K(A', C, D)$ are equal
(up to an additive term $O(\log N)$) to the corresponding complexities
$K(A)$, $K(A, C)$, $K(A, D)$, and $K(A, C, D)$.

Similarly, we say that $B'$ is a clone of $B$ if

• $B' = (b_1', b_2')$ is a point in line $C$, and $B' \in P_1$, and

• complexities $K(B')$, $K(B', C)$, $K(B', D)$, and $K(B', C, D)$ are equal
(up to an additive term $O(\log N)$) to the corresponding complexities
$K(B)$, $K(B, C)$, $K(B, D)$, and $K(B, C, D)$.

From a simple counting argument it follows that there exist
$2^{K(A | C, D) - O(\log N)}$ different clones of $A$ and
$2^{K(B | C, D) - O(\log N)}$ clones of $B$ (see, e.g., [TTJ Lemma 2]
or [171 Lemmas 1-2]).

Let us take a pair of clones $A'$ and $B'$ with maximal complexity given
$(C, D)$. Then

$$\begin{aligned}
K(A', B', C, D) &= K(C, D) + K(A' | C, D) + K(B' | C, D) + O(\log N) \\
&= K(C, D) + K(A | C, D) + K(B | C, D) + O(\log N).
\end{aligned}$$

On the other hand,

$$K(A', B', C, D) \le K(D) + K(A' | D) + K(B' | D) + K(C | A', B') + O(\log N).$$

By definition of clones, complexities $K(A' | D)$ and $K(B' | D)$ are equal
(up to an $O(\log N)$ term) to $K(A | D)$ and $K(B | D)$ respectively. Since
$A'$ and $B'$ belong to $P_0$ and $P_1$ respectively, they cannot be equal to
each other. Hence, $A'$ and $B'$ uniquely determine line $C$. So, we get

$$K(C, D) + K(A | C, D) + K(B | C, D) \le K(D) + K(A | D) + K(B | D) + O(\log N),$$

which is equivalent (by the Kolmogorov-Levin theorem) to

$$I(C : D) \le I(C : D | A) + I(C : D | B) + O(\log N).$$

□ 
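
The geometric facts used in the proof can be illustrated on a small field.
The sketch below works over the field with $2^3$ elements, represented as
3-bit integers with multiplication modulo the polynomial $x^3 + x + 1$; this
choice of $n$, the polynomial, and all function names are assumptions made
for this illustration. It shows that two points of a non-vertical line with
different first coordinates (in particular, one point from $P_0$ and one from
$P_1$) determine the line.

```python
N_BITS = 3
MODULUS = 0b1011  # x^3 + x + 1, an irreducible polynomial over GF(2)
FIELD = range(1 << N_BITS)

def gf_mul(x, y):
    """Multiply two elements of GF(2^N_BITS) (carry-less, reduced by MODULUS)."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x & (1 << N_BITS):
            x ^= MODULUS
    return result

def gf_inv(x):
    """Multiplicative inverse of a non-zero element, found by exhaustive search."""
    return next(z for z in FIELD if z and gf_mul(x, z) == 1)

def point_on_line(line, x):
    """The point (x, c1*x + c2) of the non-vertical line (c1, c2)."""
    c1, c2 = line
    return (x, gf_mul(c1, x) ^ c2)

def line_through(point_a, point_b):
    """Recover (c1, c2) from two points with different first coordinates."""
    (a1, a2), (b1, b2) = point_a, point_b
    c1 = gf_mul(a2 ^ b2, gf_inv(a1 ^ b1))  # addition and subtraction are XOR
    c2 = a2 ^ gf_mul(c1, a1)
    return (c1, c2)

def half(point, i):
    """Index of the half P_0 or P_1: the i-th bit of the first coordinate."""
    return (point[0] >> i) & 1

lines = [(c1, c2) for c1 in FIELD for c2 in FIELD]
num_triples = sum(1 for line in lines for x1 in FIELD for x2 in FIELD)
```

Here `num_triples` counts the triples (line, point, point) with non-vertical
lines and equals $2^{4n}$ for $n = 3$, which matches the $4n + O(1)$ bits
needed to specify such a triple.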